(57) Abstract 

Serial analysis of gene expression (SAGE) allows for a quantitative, representative, and comprehensive profile of gene expression. 
We have utilized SAGE technology to contrast the differential gene expression profile in rat embryo fibroblast cells producing 
temperature-sensitive p53 tumor suppressor protein at permissive or non-permissive temperatures. Analysis of ~15,000 genes revealed that 
the expression of 14 genes (p<0.001, >0.03% abundance) was dependent on functional p53 protein, whereas the expression of 3 genes was 
significantly higher in cells producing non-functional p53 protein. Those genes whose expression was increased by functional p53 include 
RAS, U6 snRNA, cyclin G, EGR-1, and several novel genes. The expression of actin, tubulin, and HSP70 genes was elevated at the 
non-permissive temperature for p53 function. Interestingly, the expression of several genes was dependent on a non-temperature-sensitive 
mutant p53, suggesting altered transcription profiles dependent on specific p53 mutant proteins. 

WO 99/01581    PCT/US98/13903 



P53 INFLUENCED GENE EXPRESSION 



TECHNICAL FIELD OF THE INVENTION 

This invention is related to genes and proteins involved in cell cycle control 
and tumorigenesis. 

BACKGROUND OF THE INVENTION 

Transcriptional regulation mediated by the p53 tumor suppressor gene 
product is implicated in numerous cell regulatory cascades, most prominently cell 
growth regulation (White, 1996; Ko & Prives, 1996). While many genes have been 
shown to be transcriptionally regulated by p53 either directly (MDM2, p21WAF1/CIP1, 
cyclin G, GADD45, IGFBP3, BAX, IGF-IR) or indirectly (Thrombospondin-1, 
TGF-α, PCNA, c-fos, c-jun, HSP70) (see Ko & Prives, 1996 for review), the 
cellular context required for specific p53-mediated transcriptional regulation remains 
ill-defined. 

Rat embryo fibroblast (REF) cells transformed with activated RAS and a 
mouse temperature-sensitive p53 (Val135) gene constitute a tightly regulated, 
well-defined system to study mechanisms involved in p53-mediated cell growth 
regulation (Michalovitz et al., 1990; Pietenpol et al., 1996; Martinez et al., 1991). 
Transformed REF cells grown at the non-permissive temperature (38°C) maintain 
the p53 protein in a non-functional conformation confined to the cytoplasm of the 
cell. Growth of RAS plus temperature-sensitive p53-transformed REF cells at the 
permissive temperature (32°C) results in the production of functional p53 protein 
capable of migrating into the nucleus and regulating transcription in a 
sequence-specific manner (Velculescu et al., 1995; Gannon & Lane, 1991). 
Furthermore, a temperature shift from 38°C to 32°C induces functional 
p53-mediated G1 arrest and apoptosis (Velculescu et al., 1995). Both p21WAF1/CIP1 
(el-Deiry et al., 1993; Harper et al., 1993) and cyclin G (Okamoto & Beach, 1994; 
Zauberman et al., 1995) have been shown to be up-regulated in these cells by direct 
p53-dependent transcriptional regulation. Although the function of cyclin G remains 
undefined, p21WAF1/CIP1 is known to regulate cell growth via direct interaction with 
cyclin-dependent kinases (CDKs) (Harper et al., 1993). Several other genes have 
been purported to be regulated by p53 within this system, but evidence for direct 
p53 transcriptional regulation is lacking (Ko & Prives, 1996). 

Thus there is a continuing need in the art for discovering new genes which 
are regulated by p53 and genes which are related to cell cycle control and 
tumorigenesis. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide methods of diagnosing 
cancer in a sample suspected of being neoplastic. 

It is another object of the present invention to provide an isolated and 
purified nucleic acid molecule which is identified by a SAGE tag. 

It is an object of the present invention to provide an isolated nucleotide probe 
comprising at least 12 nucleotides of a rat nucleic acid molecule identified by a 
SAGE tag. 

Another object of the invention is to provide a method for evaluating 
cytotoxicity or carcinogenicity of an agent. 

These and other objects of the invention are achieved by one or more 
embodiments of the invention. In one embodiment, a method is provided for 
diagnosing cancer in a sample suspected of being neoplastic. The method comprises 
the steps of: 

comparing the level of transcription of an RNA transcript in a first 
sample of a first tissue to the level of transcription of the transcript in a second 
sample of a second tissue, wherein the first tissue is suspected of being neoplastic 
and the second tissue is a normal human tissue, wherein the first and second tissue 
are of the same tissue type, and wherein the transcript is selected from the group 
consisting of Alu, RAS, U6 snRNA, 16S RNA, EGR-1, ribosomal protein S27, 
ETS-1, 28S RNA, CGR11, and LIMK-2; 

categorizing the first sample as neoplastic when transcription is found 
to be lower in the first sample than in the second sample. 

According to another embodiment of the invention a method is provided for 
diagnosing cancer in a sample suspected of being neoplastic. The method comprises 
the steps of: 

comparing the level of transcription of an RNA transcript in a first 
sample of a first tissue to the level of transcription of the transcript in a second 
sample of a second tissue, wherein the first tissue is suspected of being neoplastic 
and the second tissue is a normal human tissue, wherein the first and second tissue 
are of the same tissue type, and wherein the transcript is identified by a tag selected 
from the group consisting of ribosomal protein L13a, α-tubulin (1), α-tubulin (2), 
thymosin β-4, and γ-actin; 

categorizing the first sample as neoplastic when transcription is found 
to be higher in the first sample than in the second sample. 

According to another aspect of the invention an isolated and purified nucleic 
acid molecule is provided. The nucleic acid molecule comprises a SAGE tag 
selected from the group consisting of SEQ ID NOS: 11-16, 21-23, 25-28, 35-37, and 
39-40. 

In another embodiment of the invention an isolated nucleotide probe is 
provided. The probe comprises at least 12 nucleotides of a rat nucleic acid 
molecule, wherein the rat nucleic acid molecule comprises a SAGE tag selected 
from the group consisting of SEQ ID NOS: 11-16, 21-23, 25-28, 35-37, and 39-40. 

According to another aspect of the invention a method is provided for 
evaluating cytotoxicity or carcinogenicity of an agent. The method comprises the 
steps of: 

contacting a test agent with a rat cell; 

determining the level of transcription of a transcript in the rat cell 
after contacting with the agent; wherein an agent which decreases the level of a 
transcript identified by a SAGE tag as shown in SEQ ID NOS: 1-28, or an agent 
which increases the level of a transcript identified by a SAGE tag as shown in SEQ 
ID NOS: 29-40 is a potential cytotoxin or carcinogen. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1. Cumulative total gene representation within the REF SAGE 
analysis. Sequenced SAGE tag (transcripts) accumulation was monitored for 
unique tags (genes) sporadically throughout the analysis using the SAGE 
software package. 

Figure 2. Northern analysis of genes represented in the REF SAGE 
analysis. Poly A+ RNA from 32°C, wild-type p53 (+) or 38°C, mutant p53 
(-) REF-Val135 cells was electrophoresed, blotted, and probed with cDNA 
specific for clones 41 (EF1), 4 (cyclin G), 9, 12, 8 (ribosomal protein S27), 
3 (U6 snRNA) and 14. Tag abundance obtained from the SAGE analysis is 
shown above the lanes. Molecular weight marker migration is depicted on 
the right (Kb, kilobases). 

DETAILED DESCRIPTION 

We describe here the use of serial analysis of gene expression (SAGE) 
(Velculescu et al., 1995) to provide an extensive profile of gene expression in REF 
cells containing non-functional or functional p53. We have identified novel p53 up- 
regulated genes and down-regulated genes previously undetected by EST, 
differential display, or subtractive hybridization technologies. 

The genes which are identified as being upregulated or downregulated by 
p53 can be used to diagnose cancer in a sample. The sample can be a tissue sample 
isolated from a human which is suspected of being neoplastic. The level of 
transcription of an RNA can be determined and compared to the level in a normal 
tissue, preferably of the same tissue source. Techniques for determining levels of 
transcription of an RNA are well known in the art, and include, without limitation, 
Northern blots, nuclear run-on assays, in vitro transcription assays, primer extension 
assays, quantitative reverse transcriptase-polymerase chain reactions (RT-PCR), and 
hybrid filter binding assays. See J.C. 
Alwine, D.J. Kemp, G.R. Stark, Proc. Natl. Acad. Sci. U.S.A. 74, 5350 (1977); K. 
Zinn, D. DiMaio, T. Maniatis, Cell 34, 865 (1983); G. Veres, R.A. Gibbs, S.E. 
Scherer, C.T. Caskey, Science 237, 415 (1987). 

When a transcript identified herein as being up- or down-regulated by p53 
is found to be either up- or down-regulated in the test sample, then the sample can 
be categorized as neoplastic. Transcripts which are identified herein as being down- 
regulated in the absence of p53 are Alu, RAS, U6 snRNA, 16S RNA, EGR-1, 
ribosomal protein S27, ETS-1, 28S RNA, CGR11, and LIMK-2. Transcripts which 
are identified as being up-regulated in the absence of p53 are ribosomal protein 
L13a, α-tubulin (1), α-tubulin (2), thymosin β-4, and γ-actin. 

To increase the reliability of the determinations, more than one of these 
transcripts can be assayed. Thus the level of at least two, five, six, or ten of the 
transcripts can be determined. Assays involving both up-regulated and down- 
regulated transcripts can be combined. 
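
The categorization rule described above can be sketched in a few lines. This is only an illustrative sketch, not a validated diagnostic: the function names, the dictionary-based input format, and the `min_hits` threshold are hypothetical choices made for the example, and the transcript names simply mirror the two lists given in the text.

```python
# Transcripts the text lists as down-regulated in the absence of p53.
DOWN_WITHOUT_P53 = {
    "Alu", "RAS", "U6 snRNA", "16S RNA", "EGR-1",
    "ribosomal protein S27", "ETS-1", "28S RNA", "CGR11", "LIMK-2",
}

# Transcripts the text lists as up-regulated in the absence of p53.
UP_WITHOUT_P53 = {
    "ribosomal protein L13a", "alpha-tubulin (1)", "alpha-tubulin (2)",
    "thymosin beta-4", "gamma-actin",
}


def supporting_transcripts(test_levels, normal_levels):
    """Return transcripts whose change in the test sample mimics loss of p53.

    Both arguments map transcript name -> measured level (any consistent unit).
    Transcripts missing from the normal sample are skipped.
    """
    hits = []
    for name, test_value in test_levels.items():
        normal_value = normal_levels.get(name)
        if normal_value is None:
            continue
        if name in DOWN_WITHOUT_P53 and test_value < normal_value:
            hits.append(name)
        elif name in UP_WITHOUT_P53 and test_value > normal_value:
            hits.append(name)
    return sorted(hits)


def categorize(test_levels, normal_levels, min_hits=2):
    """Categorize the test sample; min_hits is an assumed, adjustable threshold."""
    hits = supporting_transcripts(test_levels, normal_levels)
    return "neoplastic" if len(hits) >= min_hits else "not categorized"
```

Requiring more than one concordant transcript (here two by default) reflects the suggestion in the text that assaying several transcripts increases reliability; the exact number is left to the user.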

Also provided are new transcripts which have been identified herein on the 
basis of their up- or down-regulation in the absence of p53. These transcripts are 
identified by the SAGE tags shown in SEQ ID NOS: 11-16, 21-23, 25-28, 35-37, 
and 39-40. Given the SAGE tags, it is well within the skill of the art to isolate the 
RNA or cDNA which contains the SAGE tag. Due to the method of isolation of 
SAGE tags, they occur only in the 3' end of transcripts, immediately adjacent to the 
restriction enzyme site of the enzyme which was used to generate the SAGE tags. 
Thus hybridization under stringent conditions to cDNA libraries can be used to 
select larger cDNAs which contain the SAGE tag. While the SAGE tags isolated 
herein are derived from rat cells, similar sequences from other mammals, including 
humans, can be obtained using hybridization with the SAGE tags themselves, or 
using other portions of the rat genes identified by the SAGE tags. 

Probes comprising at least 10, 12, 14, 16, 18, 20, 25, or 30 contiguous 
nucleotides of the cDNA identified by the SAGE tags can be used. The 
probe may or may not contain the sequence of the SAGE tag. The probes can be 
labeled, as is known in the art, using for example, radiolabels, fluorescent labels, 
or enzymatic labels. These probes can be used to identify and isolate the 
homologues from related mammalian species. The probes can also be used to 
directly assay in humans or other mammals for up- or down-regulation in a sample 
of the corresponding transcript. Such regulation can be used as an indicator of 
neoplasia, as discussed above. 

Combinations of probes can be provided in single or multiple vials as 
reagents for evaluating toxicity or carcinogenicity of test compounds. The reagents 
can be provided in a kit, which optionally contains instructions for performing the 
assays, buffers, growth media, cells, and detection reagents. In order to test 
compounds using such probes, the level of transcription of a transcript in a rat cell 
is determined after the rat cell has been contacted with the test compound. The 
effect of the test compound on transcription of the transcript identified by the probe 
is determined. Test compounds which cause changes in the transcript levels which 
mimic the changes caused by loss of p53 are identified as potential cytotoxins or 
carcinogens. 

We have applied SAGE technology toward the generation of growth 
regulatory transcript profiles from REF cells containing either functional or 
non-functional p53 protein. The profile includes over 15,000 genes derived from 
mutant and wild-type p53 cDNA populations. Statistically significant transcript 
differences between the two populations overwhelmingly favor the likelihood of a 
preferential transcriptional induction within functional p53 cell populations. This 
suggests that active transcription-dependent expression changes are responsible for 
growth arrest and/or apoptosis in this system. The number of apparently induced 
genes in the REF cells containing functional p53 might be an under-estimate since 
these cells have an intrinsically lower metabolic rate when grown at the lower 
permissive temperature, potentially slowing the accumulation of induced transcripts. 
Genes (actin and tubulin) required for growth were the ones with the most 
disproportionately higher expression at the non-permissive temperature for p53. 
HSP70 also showed significantly higher abundance in cells lacking functional p53 
protein, consistent with reports by others that it is down-regulated by wild-type p53 
(Agoff et al., 1993). 

Genes anticipated to be induced in the presence of functional p53 protein 
(cyclin G, p21WAF1/CIP1, MDM2, BAX-1; Ko & Prives, 1996) did show different 
numbers of SAGE tags in the two libraries; however, a few anomalies were 
observed. First, the p21WAF1/CIP1 transcript showed ~5-6 fold lower abundance than 
expected due to the proximity of a site for one of the restriction enzymes used in the 
generation of the SAGE libraries (see examples, below). Second, as noted in Table 
2, one of the tags overexpressed in the 32°C REF cDNA library represents an 
internal cyclin G sequence. Since SAGE analysis relies on the 3'-most 4-base 
restriction endonuclease site for gene identification, the presence of two cyclin G 
tags (one 3' and one internal) suggests that either internal oligo(dT)-priming is 
occurring within the cyclin G transcript or there exists a second, previously 
unidentified, cyclin G transcript. Because sequences associated with the numerous 
other restriction sites in the cyclin G gene were not observed, it is unlikely that 
partial digestion of the cDNA can explain the "internal" tag. As can be seen in 
Figure 2, clone 4 (cyclin G) hybridizes to an RNA of the expected size and to a 
smaller (~1.2 kb) transcript. It is likely that this transcript represents an alternative 
form of cyclin G RNA that gives rise to the apparently "internal" SAGE tag. The 
combined abundance of the cyclin G transcripts is ~0.34% of the total cDNA. 
While the significance of the two highly expressed tags and function of cyclin G 
remain undefined, such high levels of expression suggest cyclin G plays a major 
role in regulating cell growth of wild-type p53-containing REF cells. Finally, the 
presence of rRNA sequence tags results from the likely incomplete separation of 
mRNA from the rRNA population during library generation. Subsequent 
oligo(dT)-priming of dA-rich regions within specific rRNAs would result in 
rRNA-specific tags within the SAGE library. 

Several potentially novel growth regulatory genes have been identified with 
this SAGE analysis including genes expressed to levels as high as 0.09% (clone 9) 
of the induced mRNA population. All previously identified genes with the 
exception of U6 snRNA and LIMK-2 were previously found to be up-regulated in 
a p53-dependent manner including cyclin G (Okamoto & Beach, 1994; Zauberman 
et al., 1995), CGR11 (Madden et al., 1996), and EGR-1 and RAS (B. Vogelstein, 
personal communication). Whereas the RAS tag that was identified matches 
perfectly with the exogenous human RAS homologue, we cannot definitively 
exclude the rat RAS homologue as being responsible for the elevated tag levels since 
complete sequence information is not available for the latter. 

The differential expression observed for U6 snRNA raises some interesting 
questions. Theoretically, the detection of this small RNA molecule should not be 
possible with oligo(dT)-dependent priming. While 3' modifications have been 
shown to occur to the U6 snRNA molecule, these modifications do not include base 
additions that facilitate oligo-dT priming (Lund & Dahlberg, 1992). One possible 
explanation for the observed differential detection of U6 snRNA is that upon 
apoptotic nuclear breakdown the U6 snRNA is liberated from the nucleus and 
fortuitously polyadenylated. No other snRNA species were detected in the SAGE 
analysis. 

The EGR-1 transcription factor accounts for 0.1% of REF mRNA at 32°C. 
This well characterized transcriptional activator and repressor has been shown to be 
regulated in response to a wide array of growth regulatory stimuli, initially being 
described as an early growth response gene activated by serum (Sukhatme et al., 
1988; Cao et al., 1990). Initial studies on EGR-1 appeared to correlate expression 
with enhanced cellular proliferation; however, more recently a role of EGR-1 in 
cellular differentiation has been proposed (Bains, 1996), similar to the role proposed 
for p21WAF1/CIP1 in myogenic differentiation (Liu et al., 1996). It is tempting to 
speculate that the EGR-1 induction observed in wild-type p53 REF cells stems from 
the triggering of molecular mechanisms similar to differentiation. Further, as some 
of the wild-type p53-containing REF cells are undergoing apoptosis, the 
identification of elevated levels of EGR-1 in these cells may indicate a more 
prominent role for EGR-1 in programmed cell death than previously appreciated. 

SAGE analysis of REF-Phe132 control cells identified a small number of 
genes whose expression was apparently elevated in the REF-Phe132 mRNA 
population with respect to both 32°C and 38°C-maintained REF-Val135 cells 
(galectin-1 [Perillo et al., 1995], MTS1 [Ambartsumian et al., 1995], TRPM-2 
[apolipoprotein J/clusterin] [Wright et al., 1996], osteopontin [Oates et al., 1996]). 
We have not confirmed these differences by other analyses. REF-Phe132-specific 
regulated genes could potentially represent unique transcriptional regulation 
dependent on specific p53 mutant proteins. Experiments by other investigators 
(Chen et al., 1994; Friedlander et al., 1996; Ludwig et al., 1996) have yielded 
results suggesting that specific p53 mutants interact with unique proteins, hinting that 
p53 mutant proteins might retain transcription potential, but for genes not normally 
regulated by p53. It remains possible, however, that the observed differences result 
from leakiness of the p53 temperature-sensitive protein. That is, a subpopulation 
of "active" temperature-sensitive p53 protein at the non-permissive temperature 
might result in the selective down-regulation of genes subsequently observed to be 
preferentially expressed in the control mutant p53 population. It is noteworthy that 
we found significantly more divergence in the abundance of many SAGE tags when 
comparing REF-Phe132 versus 38°C REF-Val135 than when comparing 
REF-Val135 32°C versus 38°C. The former comparisons have been used previously 
for the identification of potential p53-regulated genes by differential display (Amson 
et al., 1996). Regardless, it is interesting that each of the known genes showing 
biased expression within REF-Phe132 cells has been linked to cellular growth 
regulation (Ambartsumian et al., 1995; And et al., 1996; Chambers, 1995; Guo et 
al., 1995; Perillo et al., 1995; Wright et al., 1996; Oates et al., 1996). Indeed, 
osteopontin was shown recently to be a metastasis-related factor in mammary tumors 
(Oates et al., 1996). 

Although the results presented have provided a quantitative overview of 
potential growth-regulatory transcripts dependent on p53, the relatively limited 
public database of rat sequences compared to the human database precludes 
immediate identification of some of the differential tags identified. We have 
previously performed a differential display analysis of this REF p53 regulatory 
system (Madden et al., 1996). Growth regulatory genes CGR11 and CGR19 were 
isolated; however, the genes were found by random sampling. The isolation of these 
genes by differential display yielded little information regarding transcript 
abundance or relative importance to other p53-regulated genes. Identification of 
CGR11 and CGR19 in the current SAGE analysis demonstrates that while both were 
highly induced and moderately abundant, many other unknown genes shared similar 
characteristics that were not identified by differential display. Equally important, 
numerous genes appeared to be differentially expressed by differential display, but 
these apparent differences could not be substantiated by other criteria (e.g., northern 
analysis). 

Correlations of tag number with gene number suggest that many more than 
the identified 15,000 genes are expressed in REF cells. Indeed, recent SAGE 
analysis of ~60,000 transcripts in the yeast Saccharomyces cerevisiae resulted in the 
identification of nearly all of the anticipated 6,000 yeast genes (Velculescu et al., 
1997). Estimates for expressed genes range from 10,000 to 50,000 unique mRNAs 
in a given cell type (Bains, 1996). Our REF SAGE analysis provides statistical 
confidence for all differential genes expressed at ≥0.03%. We expect that many 
other significant, differentially expressed genes will be revealed upon generation of 
further SAGE tags. The SAGE results presented here already provide a unique 
comparison in the transcript expression profile between cells harboring functional 
vs. non-functional p53 protein. Such broadly inclusive gene expression profiles are 
ultimately necessary and desirable for a thorough understanding of fundamental 
cellular processes such as growth regulation. 
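
The relationship between the number of tags sequenced and the number of distinct genes detected (the kind of curve summarized in Figure 1) can be illustrated with a small sketch. The function name and the `step` sampling interval are hypothetical and are not taken from the SAGE software package; the sketch simply counts unique tags seen so far at regular checkpoints.

```python
def unique_tag_accumulation(tags, step):
    """Return (tags_sequenced, unique_tags_seen) checkpoints.

    tags: list of tag strings in the order they were sequenced.
    step: record a checkpoint every `step` tags, plus one at the end.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    seen = set()
    curve = []
    total = len(tags)
    for index, tag in enumerate(tags, start=1):
        seen.add(tag)
        if index % step == 0 or index == total:
            curve.append((index, len(seen)))
    return curve
```

A curve that is still rising at the last checkpoint indicates that additional sequencing would continue to reveal new transcripts, which is the observation made in the text.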

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are not 
intended to limit the scope of the invention. 

EXAMPLE 1 

Differential transcript profiles from temperature-sensitive p53 REF cells. 
Primary REF cells transformed with Ha-RAS and temperature-sensitive p53 (mouse 
Val-135) (REF-Val135) display growth arrest and apoptotic phenotypes as early as 
4 hours after shift from the non-permissive to the permissive temperature (data not 
shown, see Ginsberg et al., 1991; Michalovitz et al., 1990; Madden et al., 1996). 
Control REF cells transformed with Ha-RAS and non-temperature-sensitive mutant 
p53 (REF-Phe132) do not exhibit growth arrest or apoptosis at the 
permissive temperature (32°C) (Ginsberg et al., 1991; Michalovitz et al., 1990). 
To provide a differential transcript profile inclusive of early p53-dependent 
transcriptional regulation, we harvested REF-Val135 cell mRNA 8 hours after 
shifting cells to the permissive temperature. Both cyclin G and p21WAF1/CIP1 display 
strong transcriptional induction at this time, whereas little or no transcript is present 
in RNA harvested from REF-Val135 cells maintained at 38°C (non-permissive 
temperature) or from control REF-Phe132 cells shifted to 32°C for 8 hours (data not 
shown, see Madden et al., 1996). 

EXAMPLE 2 

Generation of SAGE and cDNA Libraries. Rat embryo fibroblast cells 
REF-Val135 (temperature-sensitive) and REF-Phe132 (provided by B. Vogelstein 
and M. Oren, see Ginsberg et al., 1991; Michalovitz et al., 1990) were maintained 
in DMEM containing 10% fetal bovine serum in 5% CO2 at either 32°C or 38°C. 
Cells were trypsinized and replated at least 48 hours prior to any temperature shift. 
Temperature shifts were made by transfer of subconfluent flasks to pre-equilibrated 
incubators without media changes. RNA was harvested 8 hours after shift to 32°C. 
Total RNA was isolated by direct lysis in RNAzol (Tel-Test, Inc.). PolyA+ RNA 
was isolated using the MessageMaker kit (Gibco/BRL) according to the 
manufacturer's instructions. SAGE libraries were generated using 2.5 µg polyA+ 
RNA and the restriction enzyme NlaIII as described (Velculescu et al., 1995) except 
that the concatamers were cloned into SphI-digested pZErO-1 (Invitrogen). cDNA 
libraries were constructed using a λZapExpress system (Stratagene) according to the 
manufacturer's instructions. Hybridizations to λ clones were performed using either 
a 14 base or 15 base oligonucleotide end-labeled with 32P (Velculescu et al., 1995). 
Some clones were obtained using the GeneTrapper kit (Gibco/BRL) and 15 base pair 
oligomers derived from the SAGE tag. 

Plasmids. Rat clones for p21WAF1/CIP1 and MDM2 (provided by B. 
Vogelstein) were sequenced to determine respective SAGE tags. The 3'-most NlaIII 
sites are CATG/TATTTTGGTC and CATG/ATTTAGCAGT for p21WAF1/CIP1 and 
MDM2, respectively. The rat p21WAF1/CIP1 sequence which juxtaposes the NlaIII and 
BsmFI sites is 5'-CATGTATTTTGGTCCC-3'. Adaptor ligation to NlaIII-digested 
rat p21WAF1/CIP1 generates a second BsmFI site 5' to the endogenous BsmFI site, 
ultimately resulting in a reduction of the p21WAF1/CIP1 tags observed in the SAGE 
library. 
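
To make the tag definition concrete, the following minimal sketch predicts a SAGE tag from a cDNA sequence by locating the 3'-most NlaIII site (CATG) and taking the 10 bases that follow it, which is the convention implied by the CATG/TATTTTGGTC notation above. The function name and the fixed 10-base tag length are assumptions made for illustration; the sketch ignores the BsmFI interference described in the preceding paragraph.

```python
NLAIII_SITE = "CATG"   # recognition sequence of the anchoring enzyme
TAG_LENGTH = 10        # bases taken immediately 3' of the site (assumed)


def predict_sage_tag(cdna, site=NLAIII_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchoring site, or None.

    cdna: sense-strand sequence written 5' to 3'.
    None is returned when the site is absent or when fewer than
    tag_length bases follow the 3'-most site.
    """
    sequence = cdna.upper()
    position = sequence.rfind(site)  # rfind locates the 3'-most occurrence
    if position == -1:
        return None
    start = position + len(site)
    tag = sequence[start:start + tag_length]
    return tag if len(tag) == tag_length else None
```

Applied to the rat p21WAF1/CIP1 fragment quoted above, `predict_sage_tag("CATGTATTTTGGTCCC")` yields `TATTTTGGTC`, the tag listed for that gene.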

DNA Sequencing and Sequence Analysis. All DNA sequencing was 
performed with an ABI 377 (Applied Biosystems) automated DNA sequencer. 
SAGE clones were sequenced by first PCR amplifying pZErO-1 inserts with M13 
forward and reverse primers followed by sequencing with Taq FS (Perkin Elmer) 
M13 -20 dye primer ready reaction mix. SAGE sequences were extracted and 
analyzed using the GenBank database (95.0) (containing 3450 and 224 rat mRNA 
and EST sequences, respectively) and the SAGE program software package. All 
other database analyses utilized the Wisconsin GCG package software programs 
(GenBank version 95.0). Statistical significance between samples was calculated 
using the equation: 

$(N_1 - kN_1^{1/2}) - (N_2 + kN_2^{1/2})$, 

where $N_1$ and $N_2$ represent the larger and smaller of the two numbers, respectively, 
and k is the degree of confidence; p=0.05 (k=1.96), p=0.01 (k=2.58), and 
p=0.001 (k=3.29). Positive values derived from the equation were deemed 
statistically significant at the respective confidence intervals. 
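
The test above is straightforward to compute. The sketch below is one possible rendering, with hypothetical function names; the k values are the ones quoted in the text, and a pair of tag counts is treated as significant when the expression evaluates to a positive number.

```python
import math

# Degree-of-confidence constants quoted in the text.
K_BY_P = {0.05: 1.96, 0.01: 2.58, 0.001: 3.29}


def significance_margin(count_a, count_b, p=0.001):
    """Evaluate (N1 - k*sqrt(N1)) - (N2 + k*sqrt(N2)) for two tag counts.

    N1 is the larger and N2 the smaller of the two counts.
    """
    k = K_BY_P[p]
    n1, n2 = max(count_a, count_b), min(count_a, count_b)
    return (n1 - k * math.sqrt(n1)) - (n2 + k * math.sqrt(n2))


def is_significant(count_a, count_b, p=0.001):
    """True when the margin is positive at the chosen confidence level."""
    return significance_margin(count_a, count_b, p) > 0
```

For example, tag counts of 54 versus 2 give a positive margin at p=0.001, whereas counts of 327 versus 396 give a negative margin even at p=0.05 and would not be called differential.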

SAGE tag abundance and differential expression. Data for 
more than 30,000 transcripts from each of 32°C and 38°C REF-Val135 
cDNA were obtained by automated sequencing. An additional 10,519 
transcripts were analyzed from the control REF-Phe132 cells maintained at 
32°C. A summary of transcript abundance and corresponding gene 
representation is provided in Table 1. 

Sequences from each of the Val135 cDNA libraries represented more than 9,000 
different genes, with about 3,000 being represented more than once in either library. 
Combined analysis of transcripts from both 32°C and 38°C REF-Val135 cDNA 
identified more than 15,000 different genes with more than 5,000 genes represented 
more than one time. Figure 1 details the increase in gene representation as the 
number of SAGE transcripts sequenced increases, demonstrating that many new 
transcripts are still being identified after 60,000 tags were sequenced. The 
transcripts identified from control REF-Phe132 cDNA (>10,000 SAGE tags) 
represent more than 5,000 genes with about 1,200 genes represented more than one 
time. Comparative analysis of 30,000 transcripts between 32°C and 38°C cDNA 
populations yielded 28 (p<0.01) and 14 (p<0.001) genes significantly up-regulated 
in cells expressing functional p53 protein (32°C) for transcripts present at an 
abundance level ≥0.03%. In contrast, the mutant temperature-sensitive p53 cDNA 
(38°C) population yielded only 12 (p<0.01) and 3 (p<0.001) genes differentially 
induced by comparison with the 32°C cDNA population. Twenty-two and 13 
additional differential transcripts are apparent if statistical significance is relaxed to 
p<0.05 for elevated expression levels at 32°C and 38°C, respectively 
(0.02%-0.03% abundance). A summary of tags present at elevated levels (p<0.01) 
at 32°C and 38°C is presented in Table 2. 

Table 2. Differential tag abundance in REF SAGE libraries.

Clone  32°C  38°C  Tag         Abundance  Gene

REF-Val135 functional p53 up-regulated:

1      786   422   GTGGCTCACA  2.62%      Alu
2      286   25    GAGGTGCCGG  0.95%      RAS
3      132   1     GCCCCTGCGC  0.64%      U6 snRNA
4      54    2     CTJTGGGTAC  0.18%      3' cyclin G
5      43    2     GGTTAGTTGG  0.16%      internal cyclin G
6      39    10    AATCAACCCG  0.13%      16S rRNA
7      11    1     GGATATGTGG  0.10%      EGR-1
8      28    6     CTCAGACAGT  0.09%      ribos. prot. S27
9      26    4     GGCCTGGCTA  0.09%      EST105829
10     25    2     GTGCTTGTGC  0.08%      ETS-1
11     25    4     TGCGGCCTCC  0.08%      NM
12     23    4     GTCCAGAGAC  0.08%      NM
13     21    0     CCACACCCTG  0.07%      NM**
14     15    0     AGTGTCCTGG  0.05%      NM
15     14    0     GAGATCAGTT  0.05%      NM
16     13    1     GAAGCTAATA  0.04%      NM
18     11    0     GGTCAGTCGG  0.04%      28S rRNA
19     12    0     ACCTTGGAGG  0.04%      NM
20     12    0     GGTATGGTGG  0.04%      CGR11
21     10    0     ATTGGCTGGG  0.03%      NM
22     10    0     GCCCTGCGCA  0.03%      NM
23     10    0     GGACTTTGTT  0.03%      NM
24     9     0     AGGCAGACTA  0.03%      LIMK-2
25     9     0     CAGGCTTCGT  0.03%      NM
26     9     0     CTGGGTTGGC  0.03%      NM
27     9     0     GCAGTCATCT  0.03%      NM
28     9     0     TTGACTCTTA  0.03%      NM

REF-Val135 functional p53 down-regulated:

                   AGGTCGGGTG  0.76%      ribos. prot. L13a
                   ACGTCTCAAA  0.34%      α-tubulin(1)
                   TTGGTGAAGG  0.28%      thymosin β-4
                   GGTTGTTACT  0.26%      γ-actin
                   GCTGCCCTAG  0.25%      α-tubulin(2)
                   GAATAATAAA  0.21%      HSP70
                               0.13%
                               0.05%
                               0.03%
                               0.03%
                               0.03%
                               0.03%

Miscellaneous tag abundance:

41     327   396   AGGCAGACAG  1.36%      EF1
42     107   92    GCCTCCAAGG  0.36%      GAPDH
43     46    32    CTACAGAGGA  0.15%      p53 (exogenous mouse)
44     18    8     GCAGACAGTG  0.06%      BAX (mouse)
45           13    GTGGCTGCTG  0.05%      cyclin D1
46     1     9     CAAACTGCAT  0.03%      CDK4
47     7     2     ATTTAGCAGT  0.02%      MDM2
48     5     0     TATTTTGGTC  0.02%      WAF1
49     5     0     GATGACGGGA  0.02%      CGR19
50     5     0     ATGACTCGTG  0.02%      NM**

* Underlines represent tags showing differentials with confidence p<0.001. 
* Percent abundance when induced. 
* Mouse EST matches include: clone 12 = MUSGS00660, clone 39 = MUSGS00835. 
NM: no match. 
1 Represent genes for which cDNA and open reading frames have been obtained. 
** Clones identified by subtractive hybridization (B. Vogelstein and S. Zhou, personal communication). 

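
The abundance column of Table 2 can be related to the raw tag counts with a one-line calculation. The sketch below assumes a library size of 30,000 tags, which is an approximation: the text states only that more than 30,000 transcripts were sequenced from each library, so computed values may differ slightly from the tabulated ones. The function name is hypothetical.

```python
def percent_abundance(tag_count, total_tags):
    """Return a tag's abundance as a percentage of all tags in the library."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return 100.0 * tag_count / total_tags


# Assumed library size; the source reports "more than 30,000" per library.
ASSUMED_LIBRARY_SIZE = 30_000

# Clone 4 (3' cyclin G): 54 tags at 32°C.
cyclin_g_abundance = round(percent_abundance(54, ASSUMED_LIBRARY_SIZE), 2)
```

Under this assumption, 54 tags correspond to 0.18% and 9 tags to 0.03%, the lowest abundance for which the analysis claims statistical confidence.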
The three 38°C elevated tags represent γ-actin (0.26%), α-tubulin (1 isoform,
0.25%), and HSP70 (0.21%), consistent with continuing growth of the
38°C-maintained cells. The remaining 38°C-elevated tags (p < 0.01) include a second
tubulin isoform, thymosin β-4, ribosomal protein L13a, and several undefined genes.
Interestingly, CDK4 also showed a substantial differential induction in 38°C-induced
cDNA (9 occurrences at 38°C and 1 occurrence at 32°C, p<0.05).
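
The significance of a tag-count differential such as the CDK4 result can be approximated with a simple exact test. The sketch below is not the statistical procedure used to produce the p-values reported here, which this section does not describe. It assumes that the two libraries were sampled to the same total depth, so that under the null hypothesis each occurrence of a tag is equally likely to come from either library. The function name `tag_count_p_value` is illustrative only.

```python
from math import comb


def tag_count_p_value(count_a: int, count_b: int) -> float:
    """Two-sided exact binomial p-value for one tag seen in two libraries.

    Assumption (not from the source): both libraries contain the same
    total number of tags, so each occurrence falls in library A with
    probability 0.5 under the null hypothesis.
    """
    n = count_a + count_b
    if n == 0:
        return 1.0
    k = min(count_a, count_b)
    # One-tailed probability of a split at least as lopsided as k : (n - k).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test; cap at 1 for balanced counts.
    return min(1.0, 2.0 * tail)


# CDK4: 1 occurrence at 32°C and 9 occurrences at 38°C (counts from the text).
cdk4_p = tag_count_p_value(1, 9)
print(f"CDK4 1:9 -> p = {cdk4_p:.4f}")
```

Under these assumptions the CDK4 split gives a p-value of about 0.02, which is in line with the p<0.05 reported above. Libraries of unequal size would require a different null proportion.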

The 14 genes expressed at elevated levels at 32°C (p<0.001) include an Alu
repetitive tag (2.62%), RAS (0.95%), U6 snRNA (0.61%), 2 cyclin G tags (0.18% 
and 0.16%), EGR-1 [Zif268/NGFI-A/Krox-24] (0.10%), external transcribed 
spacer-1 (ETS-1) (0.08%), a clone previously identified by subtractive hybridization 
(clone 13; B. Vogelstein and S. Zhou, personal communication) (0.07%), 28S rRNA 
(0.04%), CGR11 (0.04%), and 3 uncharacterized genes (clones 9,12,14; Table 2). 
The genes corresponding to both cyclin G tags, U6 snRNA, and ETS-1 tags were 
cloned and verified to be expressed at the predicted SAGE abundance level (see 
below). The p21 WAF1/CIP1 tag was present at 5 copies in the 32°C cDNA and 0 copies
in the 38°C cDNA. This low abundance of the p21 WAF1/CIP1 tag (0.04% vs.
0.20% expected) is apparently due to the presence of a site for the restriction
endonuclease BsmFI within the p21 WAF1/CIP1 cDNA that overlaps the SAGE tag site.
Other genes showing elevated expression at 32°C include BAX1 (18:8), MDM2 (7:2), 
CGR19 (5:0), and another clone previously identified by subtractive hybridization 
(clone 50; B. Vogelstein and S. Zhou, personal communication) (5:0). One measure 
of reproducibility of the transcript profiles involves comparison of transcript levels for 
genes encoding ribosomal proteins and known housekeeping genes. As shown in 
Table 2 for EF1 (327:396) and GAPDH (107:92), well-known housekeeping genes 
are expressed at comparable levels in both samples. Most ribosomal proteins were 
also present at similar levels in 32°C and 38°C cDNA (data not shown). As expected, 
exogenous p53 (mouse Val135) also showed similar abundance in 32°C (46 tags) and
38°C (32 tags) cDNA populations (Table 2). 
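
The housekeeping comparison described above can be written as a small check. This is a minimal sketch using the 32°C:38°C tag counts quoted in the text. The 1.5-fold tolerance and the function name `within_fold` are assumptions chosen for illustration and do not come from the source. A ratio of raw counts is only meaningful if the two libraries are of similar size.

```python
# Tag counts (32°C, 38°C) as quoted in the text for Table 2.
HOUSEKEEPING_COUNTS = {
    "EF1": (327, 396),
    "GAPDH": (107, 92),
    "p53 (exogenous mouse)": (46, 32),
}


def within_fold(count_32: int, count_38: int, max_fold: float = 1.5) -> bool:
    """Return True if the two counts differ by no more than max_fold.

    max_fold = 1.5 is a hypothetical tolerance, not a value from the source.
    """
    low, high = sorted((count_32, count_38))
    if low == 0:
        # A zero count is only "comparable" to another zero count.
        return high == 0
    return high / low <= max_fold


for gene, (c32, c38) in HOUSEKEEPING_COUNTS.items():
    ratio = c32 / c38
    print(f"{gene}: 32°C/38°C = {ratio:.2f}, comparable = {within_fold(c32, c38)}")
```

With this tolerance all three transcripts are reported as comparable, whereas a split such as 9:1 would not be.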

Partial cDNAs were obtained for three previously uncharacterized and 
relatively abundant rat tags at 32°C. The clone 14 partial cDNA yielded an open 
reading frame (ORF) of 210 amino acids showing strong homology within a 
helix-loop-helix region to the HES transcription factor family (Sasai et al., 1992).
Clone 12 yielded an ORF of 149 amino acids likely to be the rat homologue of a 
previously identified human tissue specific protein (GenBank accession #X67698) of 
unknown function and clone 9 yielded an ORF of 164 amino acids with no homology 
to published proteins or protein motifs. 

EXAMPLE 3 

SAGE data for control, non-temperature-sensitive REF cells. SAGE 
analysis of the control REF-Phe132 cDNA generated from 32°C-maintained cells was
performed to greater than 10,000 transcripts. With a few exceptions, the transcript
profile from these cells resembled that generated from the REF-Val135 38°C cDNA
population (data not shown). One unknown gene was absent in the control
REF-Phe132 cDNA but was expressed at about 0.10% in both the 32°C and 38°C
REF-Val135 cDNA populations. Genes encoding galectin-1 (Perillo et al., 1995),
MTS1 (Ambartsumian et al., 1995), TRPM-2 (apolipoprotein J/clusterin) (Wright et
al., 1996), osteopontin (Oates et al., 1996), and one unknown gene were expressed
at significantly higher levels in the control REF-Phe132 cDNA than in either the 32°C
or 38°C REF-Val135 cDNA population (data not shown).

EXAMPLE 4 

Northern blot analysis. RNA analyses were performed using either total or
poly A+ RNA and Ambion's NorthernMax kit. 32P-labeled cDNA probes were
generated by random-priming and hybridized and washed according to the
manufacturer's protocol. Assurance of equivalent RNA loading was achieved either
by UV shadowing (total RNA) or EF1 probing (poly A+ RNA).

Validation of SAGE transcript representation. Confirmation of transcript 
abundance determined from the SAGE libraries was achieved using EF1 and cyclin 
G probe hybridization to λ cDNA libraries derived from the same mRNA used for
SAGE library generation. Both EF1 and cyclin G showed similar abundance in the
SAGE and λ cDNA libraries for both the REF-Val135 32°C and 38°C cDNA
populations (data not shown). Northern analyses with clone 9, 12 and 14 cDNA as 
well as probes for U6 snRNA, cyclin G, ribosomal protein S27, and EF1 were
performed with mRNA derived from 32°C- and 38°C-maintained cells (Figure 2).
Results show differential induction of all the unknown clones, U6 snRNA, cyclin G 
and ribosomal protein S27 in the 32°C mRNA population. As expected, the EF1 
probe revealed equal abundance in both populations. Thus, all SAGE transcript 
differentials also show similar differential expression by northern analysis, confirming 
representative sampling in the SAGE analysis. 
CLAIMS

1. A method of diagnosing cancer in a sample suspected of being neoplastic, 
comprising the steps of: 

comparing the level of transcription of an RNA transcript in a first 
sample of a first tissue to the level of transcription of the transcript in a second sample 
of a second tissue, wherein the first tissue is suspected of being neoplastic and the 
second tissue is a normal human tissue, wherein the first and second tissue are of the 
same tissue type, and wherein the transcript is selected from the group consisting of 
Alu, RAS, U6 snRNA, 16S RNA, EGR-1, ribosomal protein S27, ETS-1, 28S RNA,
CGR11, and LIMK-2;

categorizing the first sample as neoplastic when transcription is found 
to be lower in the first sample than in the second sample. 

2. A method of diagnosing cancer in a sample suspected of being neoplastic, 
comprising the steps of: 

comparing the level of transcription of an RNA transcript in a first 
sample of a first tissue to the level of transcription of the transcript in a second sample 
of a second tissue, wherein the first tissue is suspected of being neoplastic and the 
second tissue is a normal human tissue, wherein the first and second tissue are of the 
same tissue type, and wherein the transcript is identified by a tag selected from the 
group consisting of ribosomal protein L13a, α-tubulin (1), α-tubulin (2), thymosin
β-4, and γ-actin;

categorizing the first sample as neoplastic when transcription is found 
to be higher in the first sample than in the second sample. 

3. The method of claim 1 wherein a comparison of at least two of the transcripts 
is performed. 

4. The method of claim 2 wherein a comparison of at least two of the transcripts 
is performed. 
5. The method of claim 1 wherein a comparison of at least five of the transcripts
is performed. 

6. The method of claim 2 wherein a comparison of at least five of the transcripts 
is performed. 

7. The method of claim 1 wherein a comparison of at least ten of the transcripts 
is performed. 

8. The method of claim 2 wherein a comparison of at least six of the transcripts 
is performed. 

9. A method of diagnosing cancer in a sample suspected of being neoplastic, 
comprising the steps of: 

comparing the level of transcription of an RNA transcript in a first 
sample of a first tissue to the level of transcription of the transcript in a second sample 
of a second tissue, wherein the first tissue is suspected of being neoplastic and the 
second tissue is a normal human tissue, wherein the first and second tissue are of the 
same tissue type, and wherein the transcript contains an Alu sequence; 

categorizing the first sample as neoplastic when transcription is found 
to be lower in the first sample than in the second sample. 

10. An isolated and purified nucleic acid molecule which comprises a SAGE tag 
selected from the group consisting of SEQ ID NOS: 11-16, 21-23, 25-28, 35-37, and 
39-40. 

11. The nucleic acid molecule of claim 10 which is a cDNA molecule. 

12. The nucleic acid molecule of claim 10 wherein the SAGE tag is located at the 
3' end of the molecule. 
13. An isolated nucleotide probe comprising at least 12 nucleotides of a rat nucleic
acid molecule, wherein the rat nucleic acid molecule comprises a SAGE tag selected 
from the group consisting of SEQ ID NOS: 11-16, 21-23, 25-28, 35-37, and 39-40. 

14. The probe of claim 13 which comprises the selected SAGE tag. 

15. A reagent for evaluating toxicity or carcinogenicity of an agent, comprising 
at least 2 probes according to claim 13. 

16. The reagent of claim 15 which comprises at least 5 of said probes. 

17. The reagent of claim 15 which comprises at least 10 of said probes. 

18. The reagent of claim 15 which comprises at least 20 of said probes. 

19. The reagent of claim 15 which comprises at least 30 of said probes. 

20. A reagent for evaluating cytotoxicity or carcinogenicity, comprising at least 
2 probes according to claim 14. 

21. A method for evaluating cytotoxicity or carcinogenicity of an agent, 
comprising the steps of: 

contacting a test agent with a rat cell; 

determining the level of transcription of a transcript in the rat cell after
contacting with the agent; wherein an agent which decreases the level of a transcript 
identified by a SAGE tag as shown in SEQ ID NOS: 1-28, or an agent which 
increases the level of a transcript identified by a SAGE tag as shown in SEQ ID 
NOS: 29-40 is a potential cytotoxin or carcinogen. 